vulkan: improve im2col performance #11778
I see reductions in performance on my devices, except for some positive changes on the A770:
RTX 3090
AMD Radeon Pro VII
Intel A770
Thanks for the benchmarks. I only have my own GPU to test this with, so I wasn't sure what to expect on any others.
Yeah, it's always complicated with all the different hardware that supports Vulkan. If you don't find a way to improve it for all, and you can improve performance noticeably on your device at the cost of others, worst case we can add the shader alongside the other one and figure out on which devices to choose which variant.
For now I don't think it's worth creating a separate case given the minor improvements (I've tried applying this change to stable-diffusion.cpp and only noticed minimal improvements for large images). I'll keep learning more about Vulkan shaders, so I may find a way to improve this further.
I've just pushed some changes to the code. With these, the performance regressions on my GPU have all been resolved.
I'll try to get access to other GPUs to test the performance on them.
a432a63 to a6b70d4
EDIT: Apparently I've completely forgotten to have the RADV experimental cswave32 support enabled on my system, so I'll redo the benchmarks (the results now reflect the performance drop on the other GPUs). With these new changes the performance on my GPU increases even more. I'll be able to test the performance on an RTX 4060 in the next few days.
I'll close this PR for now as I've been mistakenly optimizing the shader for an unsupported RADV mode (cswave32). I'll create a new PR in case I'm able to improve the shader without using any RADV_PERFTEST modes.
What do you mean, unsupported? RDNA should be able to run waves of either 32 or 64. You can even control that behaviour from Vulkan.
What I mean is that I've been testing using the environment variable RADV_PERFTEST=cswave32, which isn't currently the default (I set up the variable for testing some time ago and forgot I had left it enabled, since it generally improves performance on Vulkan). Right now in this branch I've changed the approach back to having three different loops, as in the original code, while moving some unnecessary operations out of the loops. This seems to increase the performance on my card without any visible regression.
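For readers unfamiliar with the idea of moving operations out of the loops, here is a minimal CPU-side sketch in Python. It is not the shader from this PR: the function names, the single-image layout, and the output ordering (one row of `C*KH*KW` values per output pixel) are assumptions chosen for illustration. The second function computes the same result as the first but hoists index arithmetic and the vertical bounds check out of the innermost loop.

```python
def im2col_naive(img, C, H, W, KH, KW, stride=1, pad=0):
    """Reference im2col: all index math is recomputed in the innermost loop."""
    OH = (H + 2 * pad - KH) // stride + 1
    OW = (W + 2 * pad - KW) // stride + 1
    out = [0.0] * (OH * OW * C * KH * KW)
    for oh in range(OH):
        for ow in range(OW):
            for c in range(C):
                for kh in range(KH):
                    for kw in range(KW):
                        ih = oh * stride + kh - pad
                        iw = ow * stride + kw - pad
                        dst = (oh * OW + ow) * C * KH * KW + (c * KH + kh) * KW + kw
                        if 0 <= ih < H and 0 <= iw < W:
                            out[dst] = img[(c * H + ih) * W + iw]
    return out


def im2col_hoisted(img, C, H, W, KH, KW, stride=1, pad=0):
    """Same result, with loop-invariant work moved to the outermost loop possible."""
    OH = (H + 2 * pad - KH) // stride + 1
    OW = (W + 2 * pad - KW) // stride + 1
    patch = C * KH * KW              # size of one output row, constant for the call
    out = [0.0] * (OH * OW * patch)
    for oh in range(OH):
        ih0 = oh * stride - pad      # depends only on oh
        for ow in range(OW):
            iw0 = ow * stride - pad  # depends only on ow
            dst_base = (oh * OW + ow) * patch
            for c in range(C):
                src_c = c * H * W
                dst_c = dst_base + c * KH * KW
                for kh in range(KH):
                    ih = ih0 + kh
                    if not 0 <= ih < H:
                        continue     # whole kernel row is padding, stays zero
                    src_row = src_c + ih * W
                    dst_row = dst_c + kh * KW
                    for kw in range(KW):
                        iw = iw0 + kw
                        if 0 <= iw < W:
                            out[dst_row + kw] = img[src_row + iw]
    return out
```

Whether this kind of hoisting helps on a GPU depends on the compiler and the hardware, which is consistent with the mixed benchmark results reported in this thread.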
This PR tries to improve the performance of the im2col Vulkan shader.
It's my first time working on a Vulkan shader, so any feedback is welcome (I'm still not entirely convinced about the performance, since there seem to be some small regressions).
Performance on master:
PR:
ROCm for comparison: